Simulation-based optimization of Markov decision processes: An empirical process theory approach

Authors

  • Rahul Jain
  • Pravin Varaiya
Abstract

We generalize and build on the PAC learning framework for Markov decision processes developed in Jain and Varaiya (2006). We allow the reward function to depend on both the state and the action. Both the state and action spaces can be countably infinite. We obtain an estimate of the value function of a Markov decision process, which assigns to each policy its expected discounted reward. This expected reward can be estimated as the empirical average of the reward over many independent simulation runs. We derive bounds on the number of runs needed for the empirical average to converge to the expected reward uniformly over a class of policies, in terms of the VC or pseudo-dimension of the policy class. We then propose a framework to obtain an ε-optimal policy from simulation. We provide the sample complexity of such an approach. © 2010 Elsevier Ltd. All rights reserved.
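The estimation step described in the abstract, averaging the discounted reward over independent simulation runs, can be sketched as follows. This is a minimal illustration and not the authors' method: the toy two-state MDP, every function name, and the finite truncation horizon are assumptions made for the example. The run count comes from Hoeffding's inequality for a single fixed policy with rewards in [0, r_max]; it is not the uniform bound over a policy class, in terms of VC or pseudo-dimension, that the paper derives.

```python
import math
import random


def simulate_run(policy, step, reward, state, gamma, horizon, rng):
    """Discounted reward of one simulation run, truncated at `horizon` steps.

    Truncation biases the result by at most gamma**horizon * r_max / (1 - gamma).
    """
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        action = policy(state)
        total += discount * reward(state, action)  # reward depends on state and action
        state = step(state, action, rng)
        discount *= gamma
    return total


def estimate_value(policy, step, reward, state, gamma, horizon, n_runs, seed=0):
    """Empirical average of the discounted reward over independent runs."""
    rng = random.Random(seed)
    runs = [simulate_run(policy, step, reward, state, gamma, horizon, rng)
            for _ in range(n_runs)]
    return sum(runs) / n_runs


def hoeffding_runs(epsilon, delta, r_max, gamma):
    """Runs sufficient for ONE fixed policy, by Hoeffding's inequality.

    Assumes rewards lie in [0, r_max], so each run lies in [0, r_max / (1 - gamma)].
    This is a single-policy bound, not the paper's uniform bound.
    """
    value_range = r_max / (1.0 - gamma)
    return math.ceil(value_range ** 2 * math.log(2.0 / delta) / (2.0 * epsilon ** 2))


# Hypothetical toy MDP: states {0, 1}, actions {0, 1}.
def toy_reward(state, action):
    return 1.0 if action == state else 0.0


def toy_step(state, action, rng):
    # Action 1 flips the state with probability 0.5; action 0 keeps it.
    if action == 1 and rng.random() < 0.5:
        return 1 - state
    return state


def matching_policy(state):
    return state


estimate = estimate_value(matching_policy, toy_step, toy_reward,
                          state=0, gamma=0.9, horizon=50, n_runs=200)
print("estimated value:", estimate)
print("runs for epsilon=0.1, delta=0.05:", hoeffding_runs(0.1, 0.05, 1.0, 0.9))
```

An ε-optimal policy search would repeat this estimate for each candidate policy and pick the best one; making the estimates accurate for all candidates at once is where a uniform bound over the policy class is needed.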

Similar resources

Accelerated decomposition techniques for large discounted Markov decision processes

Many hierarchical techniques for solving large Markov decision processes (MDPs) are based on partitioning the state space into strongly connected components (SCCs) that can be grouped into levels. At each level, smaller problems called restricted MDPs are solved, and these partial solutions are then combined to obtain the global solution. In this paper, we first propose a novel algorith...

Robustness in portfolio optimization based on minimax regret approach

Portfolio optimization is one of the most important issues for effective and economic investment. There is plenty of research in the literature addressing this issue. Most of this research attempts to make Markowitz's primary portfolio selection model more realistic or seeks to solve the model to obtain fairly optimal portfolios. An efficient frontier in the ...

An empirical study on statistical analysis and optimization of EDM process parameters for inconel 718 super alloy using D-optimal approach and genetic algorithm

Among the several non-conventional processes, electrical discharge machining (EDM) is the most widely and successfully applied for machining conductive parts. In this technique, the tool has no mechanical contact with the workpiece, and the hardness of the workpiece has no effect on the machining pace. Hence, this technique can be employed to machine hard materials such as super allo...

Configuration and adaptation of semantic web processes

In this paper, we present the METEOR-S framework for configuration and adaptation of Semantic Web processes. This paper shows how semantic descriptions of Web services can be used to facilitate Web process configuration and adaptation. For configuration, we present an approach that uses domain knowledge captured using ontologies, in conjunction with a well-known optimization technique (Integer ...

Final Performance Report Grant FA

The researchers made significant progress in all of the proposed research areas. The first major task in the proposal involved simulation-based and sampling methods for global optimization. In support of this task, we have discovered two new innovative approaches to simulation-based global optimization; the first involves connections between stochastic approximation and our model reference appr...

Journal:
  • Automatica

Volume 46

Publication date: 2010